CMP Zip Code Analysis

Or: Where Do Our Guests Come From?

Author

Learning & Research

The Report

Summary

Where The Data Comes From

With everybody’s help we got access to three sources of zip code data:

  • CMP membership as of 2023-04-18

    • ~1,500 data points

    • A snapshot as of 2023-04-18 of every membership with a registered zip code

  • Exit Survey responses (2021-11-23 to 2023-03-28)

    • ~750 data points

    • Sales for an entire group who responded to an exit survey

      • Also includes a range of additional information that can be used alongside the zip codes
  • Sales Data (2021-06-01 to 2023-04-30)

    • ~80,000 data points

    • Aggregated total admissions from each zip code in the time frame

What We Have Done

We’ve cleaned and aggregated all of the zip code data sources into a single data frame and summarized it with some basic metrics. Additionally, we gathered zip code data from the US Census and combined it into that data frame. The metrics we used were:

  • Count (“How many in each zip code?”)

  • Percent of Sales (“What proportion of our sales does this zip code account for?”)

  • Percent of Population Served (“How much of the population of that zip code have we sold to?”)

  • Density (“How many people per square mile are in that zip code?”)

You can see these in the data table and the plots provided. Using these calculated metrics and some descriptive ones we’ve also narrowed down some probable locations where marketing might be most effective.
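As a rough illustration of how the four metrics relate to each other, here is a minimal sketch in plain Python. It is not the code used for this report: the `ZipRecord` fields, the `summarize` function, and the example numbers are all hypothetical, and it assumes "Percent of Population Served" means admissions divided by the Census population of the zip code.

```python
from dataclasses import dataclass


@dataclass
class ZipRecord:
    zip_code: str
    admissions: int         # guests counted from this zip code
    population: int         # residents (assumed to come from Census data)
    land_area_sq_mi: float  # land area in square miles


def summarize(records):
    """Return one dictionary of metrics per zip code."""
    total = sum(r.admissions for r in records)
    rows = {}
    for r in records:
        rows[r.zip_code] = {
            "count": r.admissions,
            # Share of all admissions that came from this zip code.
            "pct_of_sales": 100 * r.admissions / total if total else 0.0,
            # Admissions relative to residents (assumed definition).
            "pct_population_served": (
                100 * r.admissions / r.population if r.population else None
            ),
            # Residents per square mile.
            "density": (
                r.population / r.land_area_sq_mi if r.land_area_sq_mi else None
            ),
        }
    return rows


# Made-up numbers, for illustration only.
example = [
    ZipRecord("11111", admissions=300, population=6000, land_area_sq_mi=3.0),
    ZipRecord("22222", admissions=100, population=20000, land_area_sq_mi=10.0),
]
metrics = summarize(example)
```

Under this definition, repeat visits can push the percent of population served above what a head count of unique guests would show, so it is best read as a relative measure between zip codes.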

Some of the sales data is inflated because ‘15212’ was used as ‘unknown’ in that data set. We tried to reduce the count to what we think is accurate (based on the percentage of responses in the exit survey), but we cannot be sure. By adding in the exit survey and membership data, we give slightly more weight to the zip codes we know are accurate.
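One way such a correction could work is sketched below. This is an assumption about the approach, not the exact adjustment used for this report: it treats the share of exit survey respondents from the placeholder zip code as the true share of guests from there, and solves for the count that would produce that share. The function name and the numbers are hypothetical.

```python
def estimate_true_count(observed_home, total, survey_share):
    """Estimate real admissions from the placeholder zip code.

    observed_home: admissions recorded under the placeholder zip (real + unknown)
    total:         all admissions in the sales data
    survey_share:  fraction of exit survey respondents from that zip, in [0, 1)
    """
    if not 0 <= survey_share < 1:
        raise ValueError("survey_share must be in [0, 1)")
    known_elsewhere = total - observed_home
    # Solve x / (known_elsewhere + x) = survey_share for x.
    estimate = survey_share * known_elsewhere / (1 - survey_share)
    # The estimate can never exceed what was actually recorded.
    return min(round(estimate), observed_home)


# Made-up numbers, for illustration only.
estimated = estimate_true_count(observed_home=5000, total=14000, survey_share=0.10)
```

The result is only as good as the assumption that exit survey respondents are geographically representative of all guests, which is exactly what cannot be checked until the placeholder problem is fixed.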

The Results

From Far to Near

We have guests come in from 43 states (Alaska and Hawaii not shown) and the District of Columbia! Of course the vast majority came from Pennsylvania or our neighboring states (West Virginia, Virginia, Ohio, and New York).

In Pennsylvania we hit 48 out of 67 counties! PA guests represent 96% of our total guest pool. Those within a one-hour drive of CMP represent 98% of our guests.

Looking Forward

What We Need To Do In The Future

In the end, the most important thing we need to do is change how we enter zip code information at the front desk for guests who don’t provide one. Currently the go-to is to enter “15212” (CMP’s zip code). That presents a problem: we no longer have an accurate picture of who is coming to the Museum from the North Side, and that is the region where an accurate count matters most. If we are under-serving the people where this Museum resides, we need to remedy that.

After talking to some of the floor staff in visitor services and looking over the sales data sent to me, it seems like making “00000” the “no zip code provided” code is the best option for the following reasons:

  • It is easy to remember and enter for staff.

  • It corresponds to no geographic area and is easy to parse out when needed.

  • We already know the reporting system can handle it in a preferable way because I have seen it in the data I received.

  • So few have been entered into the system that accidental entry is unlikely (only 2 of the 171,621 total admissions in the uncleaned sales data are attributed to it).
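To illustrate the "easy to parse out" point, here is a minimal sketch of how a “00000” placeholder could be separated from real zip codes. The function names and sample counts are hypothetical. It also pads zip codes back to five digits, since spreadsheets often drop leading zeros, which would otherwise turn “00000” into “0”.

```python
UNKNOWN_ZIP = "00000"  # proposed "no zip code provided" code


def normalize_zip(value):
    """Return a five-digit zip string, dropping any ZIP+4 suffix."""
    return str(value).strip().split("-")[0].zfill(5)


def split_unknown(admissions_by_zip):
    """Separate admissions with a real zip code from the placeholder."""
    known = {}
    unknown = 0
    for raw_zip, count in admissions_by_zip.items():
        zip_code = normalize_zip(raw_zip)
        if zip_code == UNKNOWN_ZIP:
            unknown += count
        else:
            known[zip_code] = known.get(zip_code, 0) + count
    return known, unknown


# Made-up numbers, for illustration only.
sales = {"15212": 120, "15237-1234": 80, 0: 50}
known, unknown = split_unknown(sales)
```

With a dedicated placeholder, the unknown admissions can be reported on their own line instead of being mixed into the North Side count.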

We need the sales data to be the ground truth here, so we need it to be as accurate a representation as we can get. If we could fully trust the sales data, we could see whether and how membership and exit survey respondents are skewed relative to who is actually coming in.

What We Can Do In The Future

  • Coordinate with Marketing to see how geographically-bound ads affected guest turnout from the region (and other metrics)

  • See how representative we are of the specific areas we serve and how our exit survey & membership data are skewed (requires accurate reporting of all zip codes)

  • Incorporate estimated demographics (race/ethnicity and household)

  • With updated information, we can map how our guest turnout changes over time

  • Additionally I can set up a pipeline to have the data be easy to update from each team and show the analysis without needing to redo most of this work every time – or without necessarily needing outside help (i.e. me) every time.
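As a sketch of what the skew comparison above might look like once the sales data can be trusted, the snippet below compares each zip code's share of a sample (membership or exit survey) against its share of sales. The function names and counts are hypothetical, and this is only one simple way to measure skew.

```python
def shares(counts):
    """Convert raw counts per zip code into fractions of the total."""
    total = sum(counts.values())
    return {z: n / total for z, n in counts.items()} if total else {}


def skew(sample_counts, sales_counts):
    """Percentage-point gap between sample share and sales share per zip code.

    Positive values mean the zip code is over-represented in the sample.
    """
    sample, sales = shares(sample_counts), shares(sales_counts)
    zips = sorted(set(sample) | set(sales))
    return {
        z: round(100 * (sample.get(z, 0.0) - sales.get(z, 0.0)), 1) for z in zips
    }


# Made-up numbers, for illustration only.
sales_counts = {"11111": 600, "22222": 400}
member_counts = {"11111": 80, "22222": 20}
gaps = skew(member_counts, sales_counts)
```

In this made-up example, zip code 11111 holds 80% of memberships but only 60% of sales, so it shows as over-represented by 20 percentage points.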

Explorable Data

In the interest of transparency I’ve also included explorable data sets for both the plot and the cleaned/summarized table! These are only viewable if you download the file and open it in your browser instead of looking at it through Sharepoint, Teams, or however else it has been shared with you.

Below you’ll find an explorable map with all the data we have for the state of Pennsylvania. You’ll be able to zoom in and out of the map as well as move across the state. Clicking any of the highlighted zip code regions will bring up some of the information we have on it. The shading of the region denotes the percentage of the population served. Regions with red circles in them denote one of the recommended areas.

And now below we have the full summarized & cleaned data set. You’ll be able to sort, filter, and paginate through it to your heart’s content. You’ll also be able to download this as a raw CSV or Excel worksheet and do your own playing with the data if you wish!

Any additional comments, questions, concerns, or follow-ups you wish to see can be sent to Nour al-Zaghloul at L&R (nal-zaghloul@pittsburghkids.org) who will be happy to talk to you about any of the above!